Reducing Migration-induced Misses in an Over-Subscribed Multiprocessor System

Authors

  • Sajjid Reza
  • Gregory T. Byrd
Abstract

REZA, SAJJID. Reducing Migration-induced Misses in an Over-Subscribed Multiprocessor System. (Under the direction of Prof. Gregory T. Byrd.)

In a large multiprocessor server platform built from multicore chips, the scheduler often migrates a scheduling entity (a thread, process, or virtual machine) to achieve better load balancing or to ensure fairness among competing scheduling entities. Like a context switch, each such migration incurs overhead: saving and restoring processor state and virtual machine control structures, extra Translation Lookaside Buffer (TLB) misses and the associated page walks, cache misses, and interrupt rerouting. The impact of migration is likely to be more severe in virtualized environments, where heavy over-subscription of CPUs is common for server consolidation workloads or virtual desktop infrastructure deployments, causing frequent migrations and context switches. One way to mitigate this overhead is to constrain a thread or virtual CPU to execute on a specific CPU only. However, this approach is likely to cause load imbalance and would still suffer from context-switch-related misses. Furthermore, effectively determining and managing such constrained assignments of threads to CPUs is not trivial and is unrealistic, particularly in a large data center with thousands of servers running a variety of workloads. Alternatively, if we could effectively preserve the important footprints (both cache and TLB) and reload them when the process migrates, we could avoid the expensive misses.

We characterized the effectiveness of saving and restoring TLB entries across a context switch or migration and found that 60-100% of TLB misses could be avoided. We also present two predictors that select the TLB entries most likely to be reused. After a comprehensive evaluation, we found that both predictors could reduce the storage footprint by 20-80% for most workloads while keeping the false-negative rate very low.

We also characterized the effectiveness of preserving cache contents, in particular the L2, where the working set of a process/VM resides, in reducing migration-induced cache misses. We propose three MRU-based schemes (local, regional, and global) for selecting cache tags to be prefetched following a migration. After a comprehensive evaluation, we found that simple MRU (most recently used) selection provides benefits similar to the more complex and costly (in terms of hardware implementation) global or regional MRU schemes, yielding a 1.5%-27% reduction in CPI (cycles per instruction) following a migration. We also compared our migration prefetcher with standard hardware stream-based and stride-based prefetchers and found that the migration prefetcher almost always performs better. In some cases, combining the migration prefetcher with a standard hardware prefetcher provides significant further improvement.
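To make the cache-tag selection concrete, the following is a minimal illustrative sketch, not taken from the thesis, of one plausible reading of the local MRU scheme: on a migration, each L2 set is walked independently and its most recently used valid lines are recorded so that the destination core can prefetch them. The cache geometry, structure names, and the per-set save budget are all assumptions made for this example. Regional and global variants would presumably rank candidates across a group of sets or the entire cache rather than per set, at higher hardware cost.

#include <stddef.h>
#include <stdint.h>

#define NUM_SETS           1024  /* assumed L2 geometry: 1024 sets   */
#define NUM_WAYS           8     /* 8-way set associative (assumed)  */
#define MRU_LINES_PER_SET  2     /* assumed per-set save budget      */

typedef struct {
    uint64_t tag;          /* address tag of the cached line         */
    uint64_t last_access;  /* logical timestamp of the last access   */
    int      valid;
} cache_line_t;

typedef struct {
    cache_line_t lines[NUM_SETS][NUM_WAYS];
} l2_cache_t;

/* One (set, tag) pair to be prefetched on the destination core. */
typedef struct {
    uint32_t set;
    uint64_t tag;
} saved_tag_t;

/*
 * Local MRU selection: for every set, independently record up to
 * MRU_LINES_PER_SET valid lines with the largest last_access time.
 * Returns the number of (set, tag) pairs written into out[].
 */
size_t select_mru_tags(const l2_cache_t *c, saved_tag_t *out, size_t max_out)
{
    size_t n = 0;

    for (uint32_t set = 0; set < NUM_SETS && n < max_out; set++) {
        int chosen[MRU_LINES_PER_SET];
        int nchosen = 0;

        while (nchosen < MRU_LINES_PER_SET && n < max_out) {
            int best = -1;

            for (int w = 0; w < NUM_WAYS; w++) {
                const cache_line_t *l = &c->lines[set][w];
                int already_picked = 0;

                for (int j = 0; j < nchosen; j++)
                    if (chosen[j] == w)
                        already_picked = 1;

                if (!l->valid || already_picked)
                    continue;
                if (best < 0 ||
                    l->last_access > c->lines[set][best].last_access)
                    best = w;
            }

            if (best < 0)  /* no more valid lines in this set */
                break;

            chosen[nchosen++] = best;
            out[n].set = set;
            out[n].tag = c->lines[set][best].tag;
            n++;
        }
    }
    return n;
}

The recorded (set, tag) pairs would then drive prefetches into the destination core's cache after the migration, restoring the migrated entity's working set before it resumes execution.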


Similar Resources

Simulation study of memory performance of SMP multiprocessors running a TPC-W workload

The infrastructure to support electronic commerce is one of the areas where more processing power is needed. A multiprocessor system can offer advantages for running electronic commerce applications. The memory performance of an electronic commerce server, i.e. a system running electronic commerce applications, is evaluated in the case of a shared-bus multiprocessor architecture. The software arc...


Reducing Consistency Traffic and Cache Misses in the Avalanche Multiprocessor

For a parallel architecture to scale effectively, communication latency between processors must be avoided. We have found that the source of a large number of avoidable cache misses is the use of hardwired write-invalidate coherency protocols, which often exhibit high cache miss rates due to excessive invalidations and subsequent reloading of shared data. In the Avalanche project at the Universi...


A Multiprocessor System with Non-Preemptive Earliest-Deadline-First Scheduling Policy: A Performability Study

This paper introduces an analytical method for approximating the performability of a firm real-time system modeled by a multi-server queue. The service discipline in the queue is earliest-deadline-first (EDF), which is an optimal scheduling algorithm. Real-time jobs with exponentially distributed relative deadlines arrive according to a Poisson process. All jobs have deadlines until the end of s...


Modelling accesses to migratory and producer-consumer characterised data in a shared memory multiprocessor

Directory-based, write-invalidate cache coherence protocols are effective in reducing latencies to the memory but suffer from cache misses due to coherence actions. It is therefore important to understand the nature of data sharing causing misses for this class of protocols. In this paper we identify a set of parameters that characterises the accesses to migratory and producer-consumer data in...


Mixed-Criticality Multiprocessor Real-Time Systems: Energy Consumption vs Deadline Misses

Designing mixed-criticality real-time systems raises numerous challenges. In particular, reducing their energy consumption while enforcing their schedulability is still an open research topic. To address this issue, our approach exploits the ability of tasks with low-criticality levels to cope with deadline misses. On multiprocessor systems, our scheduling algorithm handles tasks with high-critic...



Journal title:
  • Parallel Processing Letters

Volume 23  Issue 

Pages  -

Publication date: 2013